Making Complex Prediction Rules Applicable for Readers: Current Practice in Random Forest Literature and Recommendations

Authors

  • Anne-Laure Boulesteix
  • Silke Janitza
  • Roman Hornung
  • Philipp Probst
  • Hannah Busen
  • Alexander Hapfelmeier
Abstract

Ideally, prediction rules (including classifiers as a special case) should be published in such a way that readers can apply them, for example to make predictions for their own data. While this is straightforward for simple prediction rules, such as those based on the logistic regression model, it is much more difficult for complex prediction rules derived by machine learning tools. We conducted a survey of articles reporting prediction rules that were constructed using the random forest algorithm and published in PLOS ONE in 2014-2015, with the aim of identifying issues related to their applicability. The presented prediction rules were applicable in only 2 of the 30 identified papers, while for a further 8 prediction rules it was possible to obtain the necessary information by contacting the authors. Various problems, such as non-response of the authors, hampered the applicability of prediction rules in the other cases. Based on our experiences from the survey, we formulate a set of recommendations for authors publishing complex prediction rules to ensure their applicability for readers.

Similar resources

Predicting the Next State of Traffic by Data Mining Classification Techniques

Traffic prediction systems can play an essential role in intelligent transportation systems (ITS). The prediction and pattern comprehensibility of characteristic traffic parameters such as average speed, flow, and travel time could be beneficial both in advanced traveler information systems (ATIS) and in ITS traffic control systems. However, due to their complex nonlinear patterns, these systems ...

Prediction of maximum surface settlement caused by earth pressure balance shield tunneling using random forest

Due to urbanization and population increase, the need for metro tunnels has increased considerably in urban areas. Estimating the surface settlement caused by tunnel excavation is an important task, especially where the tunnels are excavated in urban areas or beneath important structures. Many models have been established for this purpose by extracting the relationship between the settlement a...

Application of ensemble learning techniques to model the atmospheric concentration of SO2

In view of pollution prediction modeling, the study adopts homogeneous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...

Predicting stock prices on the Tehran Stock Exchange by a new hybridization of Fuzzy Inference System and Fuzzy Imperialist Competitive Algorithm

Investing in the stock exchange, as one of the financial resources, has always been a favorite among many investors. Today, prediction is of particular importance in the financial area, especially in stock exchanges. The main objective in these markets is to predict future price trends in order to adopt a suitable strategy for buying or selling. In general, an inv...

Costs of treatment after renal transplantation: is it worth to pay more?

Objectives: The present study aimed to estimate the current financial burden of renal transplantation therapy for insurance organisations. Methods: An Excel-based model was developed to determine the treatment costs of current clinical practice in renal transplantation therapy (RTT). Inputs were derived from Ministry of Health and insurance organizations' database, hospital and p...

Publication date: 2016